The Binomial Distribution

One of the simplest and most common examples of a random phenomenon is a coin flip: an event that is either “yes” or “no” with some probability. Here you’ll learn about the binomial distribution, which describes the outcome of a series of yes/no trials, and how to predict and simulate its behavior.

Simulating Coin Flips

In these exercises, you’ll practice using the rbinom() function, which generates random “flips” that are either 1 (“heads”) or 0 (“tails”).

Instructions

With one line of code, simulate 10 coin flips, each with a 30% chance of coming up 1 (“heads”).

Solution

# Generate 10 separate random flips with probability .3
rbinom(10,1,.3)

FALSE  [1] 1 0 0 0 1 1 0 0 0 0

Simulating Draws from a Binomial

In the last exercise, you simulated 10 separate coin flips, each with a 30% chance of heads. Thus, with rbinom(10, 1, .3) you ended up with 10 outcomes that were either 0 (“tails”) or 1 (“heads”).

But by changing the second argument of rbinom() (currently 1), you can flip multiple coins within each draw. With 10 coins per draw, each outcome will be a number between 0 and 10, showing the number of flips that were heads in that trial.

Instructions

Use the rbinom() function to simulate 100 separate occurrences of flipping 10 coins, where each coin has a 30% chance of coming up heads.

Solution

# Generate 100 occurrences of flipping 10 coins, each with 30% probability
rbinom(100,10,.3)

FALSE   [1] 2 2 0 4 4 5 3 2 3 2 2 3 4 4 3 4 6 1 2 1 1 3 3 2 4 3 2 3 3 2 3 4 2 2 4
FALSE  [36] 0 2 5 2 2 6 3 1 3 1 2 1 2 3 2 3 2 1 2 5 3 4 3 2 3 3 6 3 2 1 3 1 5 2 2
FALSE  [71] 4 5 0 3 1 3 3 4 0 3 3 4 3 4 5 5 4 1 1 4 1 4 2 4 3 4 3 2 1 2

Calculating Density of a Binomial

If you flip 10 coins each with a 30% probability of coming up heads, what is the probability exactly 2 of them are heads?

Instructions

Answer the above question using the dbinom() function. This function takes almost the same arguments as rbinom(). The second and third arguments are size and prob, but now the first argument is x instead of n. Use x to specify where you want to evaluate the binomial density.

Confirm your answer using the rbinom() function by creating a simulation of 10,000 trials. Put this all on one line by wrapping the mean() function around the rbinom() function.

Solution

# Calculate the probability that 2 are heads using dbinom
dbinom(2,10,.3)

FALSE [1] 0.2334744

# Confirm your answer with a simulation using rbinom
mean(rbinom(10000,10,.3)==2)

FALSE [1] 0.2391
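
For reference, dbinom() evaluates the binomial probability mass function, choose(size, x) * p^x * (1 - p)^(size - x). The sketch below checks that by hand for this exercise; the variable names are illustrative and not part of the course code.

```r
# Probability of exactly 2 heads in 10 flips, written out by hand
size <- 10
p <- 0.3
x <- 2
by_hand <- choose(size, x) * p^x * (1 - p)^(size - x)

# Both lines should print the same value (about 0.2335)
by_hand
dbinom(x, size, p)
```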

Calculating Cumulative Density of a Binomial

If you flip ten coins that each have a 30% probability of heads, what is the probability at least five are heads?

Instructions

Answer the above question using the pbinom() function. (Note that you can compute the probability that the number of heads is less than or equal to 4, then subtract that probability from 1.)

Confirm your answer with a simulation of 10,000 trials by finding the number of trials that result in 5 or more heads.

Solution

# Calculate the probability that at least five coins are heads
1-pbinom(4,10,.3)

FALSE [1] 0.1502683

# Confirm your answer with a simulation of 10,000 trials
mean(rbinom(10000,10,.3)>=5)

FALSE [1] 0.1497
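
Because pbinom() is the running sum of dbinom(), “at least five heads” can be cross-checked in two more ways. This is a small sketch with illustrative variable names; lower.tail = FALSE is a standard argument of pbinom() that returns the upper tail directly.

```r
size <- 10
p <- 0.3

# Complement of "4 or fewer heads"
complement <- 1 - pbinom(4, size, p)

# Upper tail P(X > 4) requested directly
upper_tail <- pbinom(4, size, p, lower.tail = FALSE)

# Sum of the individual probabilities of 5, 6, ..., 10 heads
summed <- sum(dbinom(5:size, size, p))

# All three values should agree (about 0.1503)
c(complement, upper_tail, summed)
```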

Varying the Number of Trials

In the last exercise you tried flipping ten coins with a 30% probability of heads to find the probability that at least five are heads. You found that the exact answer was 1 - pbinom(4, 10, .3) = 0.1502683, then confirmed it with 10,000 simulated trials.

Did you need all 10,000 trials to get an accurate answer? Would your answer have been more accurate with more trials?

Instructions

Try answering this question with simulations of 100, 1,000, 10,000, and 100,000 trials, so you can see which is closest to the exact answer.

Solution

# Here is how you computed the answer in the last problem
mean(rbinom(10000, 10, .3) >= 5)

FALSE [1] 0.1437

# Try now with 100, 1000, 10,000, and 100,000 trials
mean(rbinom(100, 10, .3) >= 5)

FALSE [1] 0.15

mean(rbinom(1000, 10, .3) >= 5)

FALSE [1] 0.156

mean(rbinom(10000, 10, .3) >= 5)

FALSE [1] 0.1501

mean(rbinom(100000, 10, .3) >= 5)

FALSE [1] 0.14969
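
One way to reason about these results: each simulated proportion is a mean of independent 0/1 indicators, so its standard error is sqrt(p * (1 - p) / n). The sketch below tabulates that standard error for each number of trials; the variable names are illustrative. A single run with few trials can still land close to the exact answer by chance, as the 100-trial run above did.

```r
# Exact probability of at least five heads
exact <- 1 - pbinom(4, 10, .3)

# Standard error of the simulated proportion for each number of trials
trials <- c(100, 1000, 10000, 100000)
standard_error <- sqrt(exact * (1 - exact) / trials)

# The standard error shrinks by a factor of 10 for every 100-fold increase in trials
data.frame(trials, standard_error)
```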

Calculating the Expected Value

What is the expected value of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?

Instructions

Calculate this using the exact formula you learned in the lecture: the expected value of the binomial is size * p. Print this result to the screen.

Confirm with a simulation of 10,000 draws from the binomial.

Solution

# Calculate the expected value using the exact formula
25*.3

FALSE [1] 7.5

# Confirm with a simulation using rbinom
mean(rbinom(10000,25,.3))

FALSE [1] 7.5268

Calculating the Variance

What is the variance of a binomial distribution where 25 coins are flipped, each having a 30% chance of heads?

Instructions

Calculate this using the exact formula you learned in the lecture: the variance of the binomial is size * p * (1 - p). Print this result to the screen.

Confirm with a simulation of 10,000 trials.

Solution

# Calculate the variance using the exact formula
25*.3*(1-.3)

FALSE [1] 5.25

# Confirm with a simulation using rbinom
var(rbinom(10000,25,.3))

FALSE [1] 5.278581
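
The two formulas can also be checked against the definitions of expected value and variance, by weighting every possible outcome with its dbinom() probability. This is a sketch with illustrative variable names.

```r
size <- 25
p <- 0.3

# Every possible number of heads and its exact probability
outcomes <- 0:size
probs <- dbinom(outcomes, size, p)

# Expected value: probability-weighted average of the outcomes
expected_value <- sum(outcomes * probs)

# Variance: probability-weighted average squared distance from the mean
variance <- sum((outcomes - expected_value)^2 * probs)

c(expected_value = expected_value, variance = variance)
```

These should agree with size * p = 7.5 and size * p * (1 - p) = 5.25.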

Laws of Probability

In this chapter you’ll learn to combine multiple probabilities, such as the probability two events both happen or that at least one happens, and confirm each with random simulations. You’ll also learn some of the properties of adding and multiplying random variables.

Solving for Probability of A and B

If events A and B are independent, and A has a 40% chance of happening, and event B has a 20% chance of happening, what is the probability they will both happen?

Hint: To find the probability independent events A and B both happen, multiply their probabilities.

Simulating the Probability of A and B

You can also use simulation to estimate the probability of two events both happening.

Instructions

Randomly simulate 100,000 flips of coin A, each of which has a 40% chance of being heads. Save this as a variable A.

Randomly simulate 100,000 flips of coin B, each of which has a 20% chance of being heads. Save this as a variable B.

Use the “and” operator (&) to combine the variables A and B to estimate the probability that both A and B are heads.

Solution

# Simulate 100,000 flips of a coin with a 40% chance of heads
A <- rbinom(100000, 1, .4)

# Simulate 100,000 flips of a coin with a 20% chance of heads
B <- rbinom(100000, 1, .2)

# Estimate the probability both A and B are heads
mean(A & B)

FALSE [1] 0.07967

Simulating the Probability of A, B, and C

Randomly simulate 100,000 flips of A (40% chance), B (20% chance), and C (70% chance). What fraction of the time do all three coins come up heads?

Instructions

You’ve already simulated A and B. Now simulate 100,000 flips of coin C, where each has a 70% chance of coming up heads.

Use A, B, and C to estimate the probability that all three coins would come up heads.

Solution

# You've already simulated 100,000 flips of coins A and B
A <- rbinom(100000, 1, .4)
B <- rbinom(100000, 1, .2)

# Simulate 100,000 flips of coin C (70% chance of heads)
C <- rbinom(100000, 1, .7)

# Estimate the probability A, B, and C are all heads
mean(A & B & C)

FALSE [1] 0.05593

Solving for the Probability of A or B

If coins A and B are independent, and A has a 60% chance of coming up heads, and B has a 10% chance of coming up heads, what is the probability that either A or B will come up heads?

Hint: The probability of A or B happening (when A and B are independent, as they are here) is P(A) + P(B) - P(A) * P(B).
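
The hint can be written out as arithmetic. The sketch below also computes the same quantity through the complement rule, 1 minus the probability that neither coin is heads; the variable names are illustrative.

```r
p_a <- 0.6
p_b <- 0.1

# Inclusion-exclusion for independent events
p_either <- p_a + p_b - p_a * p_b

# Same quantity via the complement: 1 - P(neither is heads)
p_complement <- 1 - (1 - p_a) * (1 - p_b)

c(p_either, p_complement)
```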

Simulating Probability of A or B

In the last exercise, you found that there was a 64% chance that either coin A (60% chance) or coin B (10% chance) would come up heads. Now you’ll confirm that answer using simulation.

Instructions

Use rbinom() to simulate 100,000 flips of coin A, each having a 60% chance of being heads.

Use rbinom() to simulate 100,000 flips of coin B, each having a 10% chance of being heads.

Use these to estimate the probability that A or B is heads.

Solution

# Simulate 100,000 flips of a coin with a 60% chance of heads
A <- rbinom(100000,1,.6)

# Simulate 100,000 flips of a coin with a 10% chance of heads
B <- rbinom(100000,1,.1)

# Estimate the probability either A or B is heads
mean(A | B)

FALSE [1] 0.63986

Probability Either Variable is Less Than or Equal to 4

Suppose X is a random Binom(10, .6) variable (10 flips of a coin with 60% chance of heads) and Y is a random Binom(10, .7) variable (10 flips of a coin with a 70% chance of heads), and they are independent.

What is the probability that either of the variables is less than or equal to 4?

Instructions

Simulate 100,000 draws from each of X (10 coins, 60% chance of heads) and Y (10 coins, 70% chance of heads) binomial variables, saving them as X and Y respectively.

Use these simulations to estimate the probability that either X or Y is less than or equal to 4.

Use the pbinom() function to calculate the exact probability that X is less than or equal to 4, then the probability that Y is less than or equal to 4.

Combine these two exact probabilities to calculate the exact probability that either variable is less than or equal to 4.

Solution

# Use rbinom to simulate 100,000 draws from each of X and Y
X <-rbinom(100000,10,.6) 
Y <- rbinom(100000,10,.7)

# Estimate the probability either X or Y is <= to 4
mean(X<=4|Y<=4)

FALSE [1] 0.20478

# Use pbinom to calculate the probabilities separately
prob_X_less <- pbinom(4,10,.6)
prob_Y_less <- pbinom(4, 10, .7)

# Combine these to calculate the exact probability either <= 4
prob_X_less+prob_Y_less-(prob_Y_less*prob_X_less)

FALSE [1] 0.2057164

Expected Value of Multiplying a Random Variable

If X is a binomial with size 50 and p = .4, what is the expected value of 3*X?

Hint: The expected value of a binomial is size * p, and the expected value of k * X is k * E[X].

Simulating Multiplying a Random Variable

In this exercise you’ll use simulation to confirm the rule you just learned about how multiplying a random variable by a constant affects its expected value.

Instructions

Simulate 100,000 draws of X, a binomial random variable with size 20 and p = .1. Save this as X.

Use this simulation to estimate the expected value of X.

Use this simulation to estimate the expected value of 5*X, as well.

Solution

# Simulate 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000,20,.1)

# Estimate the expected value of X
mean(X)

FALSE [1] 2.00379

# Estimate the expected value of 5 * X
mean(5*X)

FALSE [1] 10.01895

Variance of a Multiplied Random Variable

In the last exercise you simulated X from a binomial with size 20 and p = .1 and now you’ll use this same simulation to explore the variance.

Instructions

Use this simulation to estimate the variance of X.

Estimate the variance of 5 * X.

Solution

# X is simulated from 100,000 draws of a binomial with size 20 and p = .1
X <- rbinom(100000, 20, .1)

# Estimate the variance of X
var(X)

FALSE [1] 1.797931

# Estimate the variance of 5 * X
var(5*X)

FALSE [1] 44.94829
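
The simulated values line up with the exact rule Var(k * X) = k^2 * Var(X): multiplying by 5 multiplies the variance by 25, not by 5. A short sketch of the exact calculation, with illustrative variable names:

```r
size <- 20
p <- 0.1
k <- 5

# Exact variance of X from the binomial formula
var_x <- size * p * (1 - p)

# Scaling by k multiplies the variance by k squared
var_kx <- k^2 * var_x

c(var_x = var_x, var_kx = var_kx)
```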

Solving for the Sum of Two Binomial Variables

If X is drawn from a binomial with size 20 and p = .3, and Y from size 40 and p = .1, what is the expected value (mean) of X + Y?

Hint: Compute the expected value of X and the expected value of Y separately, then add them together.

Simulating Adding Two Binomial Variables

In the last exercise, you found the expected value of the sum of two binomials. In this problem you’ll use a simulation to confirm your answer.

Instructions

Simulate 100,000 draws from X, a binomial with size 20 and p = .3, and Y, with size 40 and p = .1.

Use this simulation to estimate the expected value of X + Y.

Solution

# Simulate 100,000 draws of X (size 20, p = .3) and Y (size 40, p = .1)
X <-rbinom(100000,20,.3)
Y <-rbinom(100000,40,.1)

# Estimate the expected value of X + Y
mean(X+Y)

FALSE [1] 9.99454

Simulating Variance of Sum of Two Binomial Variables

In the last multiple choice exercise, you examined the expected value of the sum of two binomials. Here you’ll estimate the variance.

Instructions

Use your simulation of the variables X and Y to estimate the variance of X + Y. Use your simulation to estimate the variance of 3 * X + Y.

Solution

# Simulation from last exercise of 100,000 draws from X and Y
X <- rbinom(100000, 20, .3) 
Y <- rbinom(100000, 40, .1)

# Find the variance of X + Y
var(X+Y)

FALSE [1] 7.816989

# Find the variance of 3 * X + Y
var(3*X+Y)

FALSE [1] 41.64792
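
For comparison, the exact values follow from two rules: the variances of independent variables add, and Var(k * X) = k^2 * Var(X). The sketch below assumes X and Y are independent, as they are in the simulation; the variable names are illustrative.

```r
# Exact variances of X ~ Binomial(20, .3) and Y ~ Binomial(40, .1)
var_x <- 20 * .3 * (1 - .3)
var_y <- 40 * .1 * (1 - .1)

# Variance of X + Y for independent X and Y
var_sum <- var_x + var_y

# Variance of 3 * X + Y: the factor 3 enters squared
var_scaled_sum <- 3^2 * var_x + var_y

c(var_sum = var_sum, var_scaled_sum = var_scaled_sum)
```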

Bayesian Statistics

Updating

Suppose you have a coin that is equally likely to be fair (50% heads) or biased (75% heads). You then flip the coin 20 times and see 11 heads.

Without doing any math, which do you now think is more likely: that the coin is fair, or that the coin is biased?

Updating with Simulation

We see 11 heads out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair? Answer this by simulating 50,000 fair coins and 50,000 biased coins.

Instructions

Simulate 50,000 cases of flipping 20 coins from a fair coin (50% chance of heads), as well as from a biased coin (75% chance of heads). Save these variables as fair and biased respectively.

Find the number of fair coins where exactly 11/20 came up heads, then the number of biased coins where exactly 11/20 came up heads. Save them as fair_11 and biased_11 respectively.

Find the fraction of all coins that came up heads 11 times that were fair coins: this is the posterior probability that a coin with 11 heads out of 20 flips is fair.

Solution

# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <-rbinom(50000,20,.5) 
biased <- rbinom(50000,20,.75)

# How many fair cases, and how many biased, led to exactly 11 heads?
fair_11 <- sum(fair==11)
biased_11 <- sum(biased==11)

# Find the fraction of fair coins that are 11 out of all coins that were 11
fair_11/(fair_11+biased_11)

FALSE [1] 0.8601563

Updating After 16 Heads

Suppose that when you flip a different coin (that could either be fair or biased) 20 times, you see 16 heads.

Without doing any math, which do you now think is more likely: that this coin is fair, or that it’s biased?

Updating with Simulation After 16 Heads

We see 16 heads out of 20 flips from a coin that is either fair (50% chance of heads) or biased (75% chance of heads). How likely is it that the coin is fair?

Instructions

Simulate 50,000 cases of flipping 20 coins from a fair coin (50% chance of heads), as well as from a biased coin (75% chance of heads). Save these variables as fair and biased respectively.

Find the number of fair coins where exactly 16/20 came up heads, then the number of biased coins where exactly 16/20 came up heads. Save them as fair_16 and biased_16 respectively.

Print the fraction of all coins that came up heads 16 times that were fair coins: this is the posterior probability that a coin with 16 heads out of 20 flips is fair.

Solution

# Simulate 50000 cases of flipping 20 coins from fair and from biased
fair <- rbinom(50000,20,.5)
biased <- rbinom(50000,20,.75)

# How many fair cases, and how many biased, led to exactly 16 heads?
fair_16 <- sum(fair==16)
biased_16 <- sum(biased==16)

# Find the fraction of fair coins that are 16 out of all coins that were 16
fair_16/(fair_16+biased_16)

FALSE [1] 0.02384546

Updating with Priors

Suppose we see 14 heads out of 20 flips, and start with an 80% chance that the coin is fair and a 20% chance that it is biased (75% chance of heads).

You’ll solve this case with simulation, by starting with a “bucket” of 10,000 coins, where 8,000 are fair and 2,000 are biased, and flipping each of them 20 times.

Instructions

Simulate 8,000 trials of flipping a fair coin 20 times and 2,000 trials of flipping a biased coin 20 times. Save them as fair_flips and biased_flips, respectively.

Find the number of cases that resulted in 14 heads from each coin, saving them as fair_14 and biased_14 respectively.

Find the fraction of all coins that resulted in 14 heads that were fair: this is an estimate of the posterior probability that the coin is fair.

Solution

# Simulate 8000 cases of flipping a fair coin, and 2000 of a biased coin
fair_flips <-rbinom(8000,20,.5)
biased_flips <-rbinom(2000,20,.75)

# Find the number of cases from each coin that resulted in 14/20
fair_14 <-sum(fair_flips==14)
biased_14 <-sum(biased_flips==14)

# Use these to estimate the posterior probability
fair_14/(fair_14+biased_14)

FALSE [1] 0.4791667
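
With only 10,000 simulated coins the estimate above is fairly noisy. As a cross-check, the same posterior can be computed exactly by weighting each dbinom() likelihood by its prior. This is a sketch with illustrative variable names.

```r
# Prior probabilities of each kind of coin
prior_fair <- 0.8
prior_biased <- 0.2

# Exact probability of 14 heads in 20 flips under each kind of coin
likelihood_fair <- dbinom(14, 20, .5)
likelihood_biased <- dbinom(14, 20, .75)

# Bayes' theorem: prior-weighted likelihood of "fair" over the total
posterior_fair <- (prior_fair * likelihood_fair) /
  (prior_fair * likelihood_fair + prior_biased * likelihood_biased)

posterior_fair  # roughly 0.47
```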

Updating with Three Coins

Suppose instead of a coin being either fair or biased, there are three possibilities: that the coin is fair (50% heads), biased low (25% heads), or biased high (75% heads). There is an 80% chance it is fair, a 10% chance it is biased low, and a 10% chance it is biased high.

You see 14/20 flips are heads. What is the probability that the coin is fair?

Instructions

Use the rbinom() function to simulate 80,000 draws from the fair coin, 10,000 draws from the high coin, and 10,000 draws from the low coin, with each draw containing 20 flips. Save them as flips_fair, flips_high, and flips_low, respectively.

For each of these types, compute the number of coins that resulted in 14. Save them as fair_14, high_14, and low_14, respectively.

Find the posterior probability that the coin was fair, by dividing the number of fair coins resulting in 14 by the total number of coins resulting in 14.

Solution

# Simulate 80,000 draws from fair coin, 10,000 from each of high and low coins
flips_fair <-rbinom(80000,20,.5) 
flips_high <- rbinom(10000,20,.75)
flips_low <- rbinom(10000,20,.25)

# Compute the number of coins that resulted in 14 heads from each of these piles
fair_14 <- sum(flips_fair==14)
high_14 <- sum(flips_high==14)
low_14 <- sum(flips_low==14)

# Compute the posterior probability that the coin was fair
fair_14/(fair_14+high_14+low_14)

FALSE [1] 0.6359348
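
The same calculation extends to any number of candidate coins. The helper below is a hypothetical sketch (posterior_probs is not a course or package function): it multiplies each prior by the exact dbinom() likelihood and normalizes the results so that they sum to 1.

```r
# Hypothetical helper: exact posterior over several candidate coins
posterior_probs <- function(heads, flips, coin_probs, priors) {
  # Likelihood of the observed heads count under each candidate coin
  likelihoods <- dbinom(heads, flips, coin_probs)
  # Weight by the priors, then normalize
  weighted <- priors * likelihoods
  setNames(weighted / sum(weighted), names(coin_probs))
}

posterior <- posterior_probs(
  heads = 14,
  flips = 20,
  coin_probs = c(fair = .5, low = .25, high = .75),
  priors = c(.8, .1, .1)
)

posterior
```

The entry for the fair coin should land close to the simulated value above.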

Updating with Bayes Theorem

In this chapter, you used simulation to estimate the posterior probability that a coin that resulted in 11 heads out of 20 is fair. Now you’ll calculate it again, this time using the exact probabilities from dbinom(). There is a 50% chance the coin is fair and a 50% chance the coin is biased.

Instructions

Use the dbinom() function to calculate the exact probability of getting 11 heads out of 20 flips with a fair coin (50% chance of heads) and with a biased coin (75% chance of heads). Save them as probability_fair and probability_biased, respectively.

Use these to calculate the posterior probability that the coin is fair. This is the probability that you would get 11 from a fair coin, divided by the sum of the two probabilities.

Solution

# Use dbinom to calculate the probability of 11/20 heads with fair or biased coin
probability_fair <-dbinom(11,20,.5)
probability_biased <-dbinom(11,20,.75)

# Calculate the posterior probability that the coin is fair
probability_fair/(probability_fair+probability_biased)

FALSE [1] 0.8554755

Updating for Other Outcomes

In the last exercise, you solved for the probability that the coin is fair if it results in 11 heads out of 20 flips, assuming that beforehand there was an equal chance of it being a fair coin or a biased coin. Recall that the code looked something like:

probability_fair <- dbinom(11, 20, .5)
probability_biased <- dbinom(11, 20, .75)
probability_fair / (probability_fair + probability_biased)

Now you’ll find, using the dbinom() approach, the posterior probability if there were two other outcomes.

Instructions

Find the probability that a coin resulting in 14 heads out of 20 flips is fair.

Find the probability that a coin resulting in 18 heads out of 20 flips is fair.

Solution

# Find the probability that a coin resulting in 14/20 is fair
dbinom(14,20,.5)/(dbinom(14,20,.75)+dbinom(14,20,.5))

FALSE [1] 0.179811

# Find the probability that a coin resulting in 18/20 is fair
dbinom(18,20,.5)/(dbinom(18,20,.75)+dbinom(18,20,.5))

FALSE [1] 0.002699252

More Updating with Priors

Suppose we see 16 heads out of 20 flips, which would normally be strong evidence that the coin is biased. However, suppose we had set a prior probability of a 99% chance that the coin is fair (50% chance of heads), and only a 1% chance that the coin is biased (75% chance of heads).

You’ll solve this exercise by finding the exact answer with dbinom() and Bayes’ theorem. Recall that Bayes’ theorem looks like:

$Pr(fair|A) = \frac{Pr(A|fair)Pr(fair)}{Pr(A|fair)Pr(fair)+Pr(A|biased)Pr(biased)}$

Instructions

Use dbinom() to calculate the probabilities that a fair coin and a biased coin would result in 16 heads out of 20 flips.

Use Bayes’ theorem to find the posterior probability that the coin is fair, given that there is a 99% prior probability that the coin is fair.

Solution

# Use dbinom to find the probability of 16/20 from a fair or biased coin
probability_16_fair <-dbinom(16,20,.5)
probability_16_biased <-dbinom(16,20,.75)

# Use Bayes' theorem to find the posterior probability that the coin is fair
(.99*probability_16_fair)/(.99*probability_16_fair+.01*probability_16_biased)

FALSE [1] 0.7068775
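
An equivalent way to read this result is in odds form: posterior odds equal prior odds times the likelihood ratio. The sketch below uses illustrative variable names and should reproduce the posterior above.

```r
# Prior odds in favor of a fair coin (99 to 1)
prior_odds <- 0.99 / 0.01

# How much more likely 16/20 heads is under the fair coin than under the biased one
likelihood_ratio <- dbinom(16, 20, .5) / dbinom(16, 20, .75)

# Posterior odds, converted back to a probability
posterior_odds <- prior_odds * likelihood_ratio
posterior_fair <- posterior_odds / (1 + posterior_odds)

posterior_fair
```

The likelihood ratio is well below 1, so the data favor the biased coin, but the 99-to-1 prior odds outweigh it.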

Related Distributions

Approximating a Binomial to the Normal

Suppose you flipped 1000 coins, each with a 20% chance of being heads. What would be the mean and variance of the binomial distribution?

Hint: The variance of the binomial can be computed as n * p * (1 - p).

Simulating from the Binomial and the Normal

In this exercise you’ll see for yourself whether the normal is a reasonable approximation to the binomial by simulating large samples from the binomial distribution and its normal approximation and comparing their histograms.

Instructions

Generate 100,000 draws from the Binomial(1000, .2) distribution. Save this as binom_sample.

Generate 100,000 draws from the normal distribution that approximates this binomial distribution, using the rnorm() function. (Remember that rnorm() takes the mean and the standard deviation, which is the square root of the variance). Save this as normal_sample.

Compare the two distributions with the compare_histograms1() function. (Remember that this takes two arguments: the first and second vectors to compare).

Solution

# Draw a random sample of 100,000 from the Binomial(1000, .2) distribution
binom_sample <- rbinom(100000,1000,.2)

# Draw a random sample of 100,000 from the normal approximation
normal_sample <- rnorm(100000,mean=200,sd=sqrt(1000*.2*(1-.2)))

# Compare the two distributions with the compare_histograms1 function
compare_histograms1(binom_sample, normal_sample)
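
compare_histograms1() is a helper supplied by the course and is not defined in these notes. As a rough numerical complement to the plot, the sketch below compares the mean and standard deviation of the two samples. The seed and the name summary_table are assumptions added for reproducibility and illustration.

```r
set.seed(2017)  # arbitrary seed, only so the numbers are reproducible

binom_sample <- rbinom(100000, 1000, .2)
normal_sample <- rnorm(100000, mean = 200, sd = sqrt(1000 * .2 * (1 - .2)))

# Mean and standard deviation of each sample, side by side
summary_table <- rbind(
  binomial = c(mean = mean(binom_sample), sd = sd(binom_sample)),
  normal   = c(mean = mean(normal_sample), sd = sd(normal_sample))
)

summary_table
```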

Comparing the Cumulative Density of the Binomial

If you flip 1000 coins that each have a 20% chance of being heads, what is the probability you would get 190 heads or fewer?

You’ll get similar answers if you solve this with the binomial or its normal approximation. In this exercise, you’ll solve it both ways, using both simulation and exact calculation.

Instructions

Use the simulated binom_sample (provided) from the last exercise to estimate the probability of getting 190 or fewer heads.

Use the simulated normal_sample to estimate the probability of getting 190 or fewer heads.

Calculate the exact probability of the binomial being <= 190 with pbinom().

Calculate the exact probability of the normal being <= 190 with pnorm().

Solution

# simulations from the normal and binomial distributions
binom_sample <- rbinom(100000, 1000, .2)
normal_sample <- rnorm(100000, 200, sqrt(160))

# Use binom_sample to estimate the probability of <= 190 heads
sum(binom_sample<=190)/100000

FALSE [1] 0.22748

# Use normal_sample to estimate the probability of <= 190 heads
sum(normal_sample<=190)/100000

FALSE [1] 0.21504

# Calculate the probability of <= 190 heads with pbinom
pbinom(190,1000,.2)

FALSE [1] 0.2273564

# Calculate the probability of <= 190 heads with pnorm
pnorm(190,200,sqrt(160))

FALSE [1] 0.2145977
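
Part of the gap between the two exact answers comes from approximating a discrete distribution with a continuous one. A standard remedy, not part of the original exercise, is a continuity correction: evaluate the normal at 190.5 rather than 190. A sketch with illustrative variable names:

```r
exact <- pbinom(190, 1000, .2)

# Normal approximation without and with a continuity correction
plain <- pnorm(190, 200, sqrt(160))
corrected <- pnorm(190.5, 200, sqrt(160))

c(exact = exact, plain = plain, corrected = corrected)
```

The corrected value should sit noticeably closer to the pbinom() result.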

Comparing the Distributions of the Normal and Binomial for Low n

When we flip a lot of coins, it looks like the normal distribution is a pretty close approximation. What about when we flip only 10 coins, each still having a 20% chance of coming up heads? Is the normal still a good approximation?

Instructions

Generate 100,000 draws from the Binomial(10, .2) distribution. Save this as binom_sample.

Generate 100,000 draws from the normal distribution that approximates this binomial distribution, using the rnorm() function. Save this as normal_sample.

Compare the two distributions with the compare_histograms1() function. (Remember that this takes two arguments: the two samples that are to be compared).

Solution

# Draw a random sample of 100,000 from the Binomial(10, .2) distribution
binom_sample <-rbinom(100000,10,.2)

# Draw a random sample of 100,000 from the normal approximation
normal_sample <- rnorm(100000,mean=2,sd=sqrt(1.6))

# Compare the two distributions with the compare_histograms1 function
compare_histograms1(binom_sample,normal_sample)

Approximating a Binomial with a Poisson

If you were drawing from a binomial with size = 1000 and p = .002, what would be the mean of the Poisson approximation?

Hint: How does the mean of the Poisson approximation relate to the mean of the binomial random variable?

Simulating from a Poisson and a Binomial

If you were flipping 1,000 coins that each have a .2% chance of coming up heads, you could use a Poisson(2) distribution to approximate the number of heads. Let’s check that through simulation.

Instructions

Generate 100,000 draws from the Binomial(1000, .002) distribution. Save it as binom_sample.

Generate 100,000 draws from the Poisson distribution that approximates this binomial distribution, using the rpois() function. Save it as poisson_sample.

Compare the two distributions with the compare_histograms1() function. (Remember that this takes two arguments: the two samples that are to be compared).

Solution

# Draw a random sample of 100,000 from the Binomial(1000, .002) distribution
binom_sample <- rbinom(100000,1000,.002)

# Draw a random sample of 100,000 from the Poisson approximation
poisson_sample <- rpois(100000,lambda=2)

# Compare the two distributions with the compare_histograms1 function
compare_histograms1(binom_sample,poisson_sample)
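
The approximation can also be checked without simulation by comparing the exact densities from dbinom() and dpois(). In this sketch the range 0 to 8 and the variable names are illustrative choices.

```r
counts <- 0:8

# Exact probabilities of each count under the binomial and its Poisson approximation
binom_density <- dbinom(counts, 1000, .002)
poisson_density <- dpois(counts, lambda = 2)

round(cbind(counts, binom_density, poisson_density), 5)

# Largest absolute difference between the two densities
max(abs(binom_density - poisson_density))
```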

Density of the Poisson Distribution

In this exercise you’ll find the probability that a Poisson random variable will be equal to zero by simulating and using the dpois() function, which gives an exact answer.

Instructions

Simulate 100,000 draws from a Poisson distribution with a mean of 2.

Use this simulation to estimate the probability that a draw from this Poisson distribution will be 0.

Find the exact probability that a draw from a Poisson(2) distribution is zero, using the dpois() function.
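
A minimal sketch of one way to work through these steps. The variable names and the seed are assumptions; the exact value is dpois(0, 2), which equals exp(-2).

```r
set.seed(2017)  # arbitrary seed, only so the numbers are reproducible

# Simulate 100,000 draws from a Poisson distribution with mean 2
poisson_sample <- rpois(100000, 2)

# Estimate the probability of a draw being 0
estimate <- mean(poisson_sample == 0)

# Exact probability of 0 from the Poisson density
exact <- dpois(0, 2)

c(estimate = estimate, exact = exact)
```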

Sum of Two Poisson Variables

One of the useful properties of the Poisson distribution is that when you add independent Poisson random variables together, the result also follows a Poisson distribution, with a mean equal to the sum of the individual means.

Here you’ll generate two random Poisson variables to test this.

Instructions

Simulate 100,000 draws from the Poisson(1) distribution, saving them as X.

Simulate 100,000 draws separately from the Poisson(2) distribution, and save them as Y.

Add X and Y together to create a variable Z.

We expect Z to follow a Poisson(3) distribution. Use the compare_histograms2() function to compare Z to 100,000 draws from a Poisson(3) distribution.

Solution

# Simulate 100,000 draws from Poisson(1)
X<-rpois(100000,1)

# Simulate 100,000 draws from Poisson(2)
Y<-rpois(100000,2)

# Add X and Y together to create Z
Z<-X+Y

# Use compare_histograms2 to compare Z to the Poisson(3)
compare_histograms2(Z,rpois(100000,3))

Waiting for the First Coin Flip

You’ll start by simulating a series of coin flips, and “waiting” for the first heads.

Instructions

Simulate 100 instances of flipping a single coin with a 20% chance of heads, and save it as the variable flips. (Thus, flips should be a vector of length 100).

Use which() to find the first case where a coin resulted in heads.

Solution

# Simulate 100 instances of flipping a 20% coin
flips <- rbinom(100,1,.20)

# Use which to find the first case of 1 ("heads")
which(flips==1)[1]

FALSE [1] 2

Using `replicate()` for Simulation

Use the replicate() function to simulate 100,000 trials of waiting for the first heads after flipping coins with 20% chance of heads. Plot a histogram of this simulation by calling qplot().

Instructions

Use replicate() to simulate 100,000 geometric trials. Copy and paste the expression given to you as your second argument to replicate().

Plot a histogram by calling qplot() on the replications.

Solution

# Existing code for finding the first instance of heads
which(rbinom(100, 1, .2) == 1)[1]

FALSE [1] 7

# Replicate this 100,000 times using replicate()
replications <-replicate(100000,which(rbinom(100, 1, .2) == 1)[1]) 

# Histogram the replications with qplot
qplot(replications)

FALSE `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Simulating from the Geometric Distribution

In this exercise you’ll compare your replications with the output of rgeom().

Instructions

Use the function rgeom() to simulate 100,000 draws from a geometric distributions with probability .2. Save this as geom_sample.

Compare replications and geom_sample with the compare_histograms3() function.

Solution

# Replications from the last exercise
replications <- replicate(100000, which(rbinom(100, 1, .2) == 1)[1])

# Generate 100,000 draws from the corresponding geometric distribution
geom_sample <- rgeom(100000,.2)

# Compare the two distributions with compare_histograms3
compare_histograms3(replications,geom_sample)
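
One detail to keep in mind when reading the comparison: R’s rgeom() counts the number of failures before the first success, so its values start at 0, while replications records the flip on which the first heads appears, so its values start at 1. The sketch below makes that offset visible; the seed is an assumption added for reproducibility.

```r
set.seed(2017)  # arbitrary seed, only so the numbers are reproducible

replications <- replicate(100000, which(rbinom(100, 1, .2) == 1)[1])
geom_sample <- rgeom(100000, .2)

# Position of the first heads: average close to 1 / .2 = 5
mean(replications)

# Tails before the first heads: average close to (1 - .2) / .2 = 4
mean(geom_sample)

# Subtracting 1 puts the replications on the same scale as rgeom()
mean(replications - 1)
```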

Probability of a Machine Lasting X Days

A new machine arrives in a factory. This type of machine is very unreliable: every day, it has a 10% chance of breaking permanently. How long would you expect it to last?

Notice that this is described by the cumulative distribution of the geometric distribution, and therefore the pgeom() function. pgeom(X, .1) gives the probability that there are X or fewer working days before the day it breaks (that is, that it breaks on day X + 1 or earlier).

Instructions

Use pgeom() to find the probability that the machine breaks on the 5th day or earlier.

Use pgeom() to find the probability that the machine is still working by the end of the 20th day.

Solution

# Find the probability the machine breaks on 5th day or earlier
pgeom(4,.1)

FALSE [1] 0.40951

# Find the probability the machine is still working at the end of the 20th day
1-pgeom(19,.1)

FALSE [1] 0.1215767
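
Both answers have a simple closed form: the machine survives d days with probability (1 - p)^d, where p is the daily chance of breaking. The sketch below checks the pgeom() results against that formula and answers the opening question using the mean of the geometric distribution, (1 - p) / p. The variable names are illustrative.

```r
p_break <- 0.1

# Breaks on day 5 or earlier: 1 minus the chance of surviving 5 days
breaks_by_5 <- 1 - (1 - p_break)^5

# Still working at the end of day 20: survives 20 days in a row
works_after_20 <- (1 - p_break)^20

c(breaks_by_5, pgeom(4, p_break))
c(works_after_20, 1 - pgeom(19, p_break))

# Expected number of working days before the day it breaks
expected_working_days <- (1 - p_break) / p_break
expected_working_days
```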

Graphing the Probability that a Machine Still Works

If you were a supervisor at the factory with the unreliable machine, you might want to understand how likely the machine is to keep working over time. In this exercise, you’ll plot the probability that the machine is still working across the first 30 days.

Instructions

Calculate a vector of probabilities of whether the machine is still working on each day from day 1 to 30, and save it as still_working. You can do this with a single call to pgeom() by passing in a vector of numbers as the first argument. The machine has a 10% chance of breaking each day.

Run the command qplot(still_working) to graph the probability of the machine still working on each of the first 30 days, with the day on the x-axis and the probability on the y-axis.

Solution

# Calculate the probability of machine working on day 1-30
still_working <-1-pgeom(0:29,.1)

# Plot the probability for days 1 to 30
qplot(1:30, still_working)

Foundations of Probability in R

Daniel Trombley

November 06, 2017
